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Abstract 


A common approach to matching two scenes is 
the use of an area cross-correlation measure, 
This measure is proportional to the root-mean- 
square difference between the corresponding pixel 
intensities for the two scenes. In practice, itis 
often found that the magnitude of this measure at 
the correct match position may be rather low, and 
that the maximum value appears at an incorrect 
match position. This problem of false-fixing 
results from differences between the reference 
and sensed scene in their respective range, view- 
ing angle, illumination, contrast level, and imag- 
ing sensor type. In this paper we present a com- 
parison and trade-offs between correlation and 
alternative matching techniques that do not suffer 
from the same weaknesses as those of correlation. 
A specific example of a matching scheme is also 
discussed that uses a scene model based on local 
structural features from the scene. The results 
of such a matching scheme on a set of two repre- 
sentative images are presented. 


I. Introduction 


The problem of matching two scenes (refer- 
ence and sensed) is a classical one, and of 
importance in many types of applications in the 
field of image processing. Matching is a trivial 
task if the sensed image or a part of it is an exact 
copy of the template, or the reference scene 
being matched. Unfortunately, an exact duplicate 
is usually very difficult to obtain. In practice, the 
sensed image may be noisy, distorted, or 
obtained from a different imaging sensor and from 
a different perspective or range than that of the 
reference image. Thus at best, one can hope to 
maximize a measure of the degree of match 
between the two scenes. 


Area cross-correlation is frequently used for 
scene matching and is briefly discussed in Section 
I. This matching technique has some inherent 
limitations and due to these limitations eroneous 
matching results are often obtained in practical 
applications. Some basic concepts for alternative 
matching schemes that do not suffer from the 





correlation difficulties are presented in Section 
HI. These techniques generally use a simplified 
scene representation or model for the matching 
process. The image feature characteristics nec- 
essary to build a scene model are discussed in 
Section IV. A specific example of a matching 
scheme that uses a simple scene model based on 
vertices is given in Section V, The results of this 
vertex-based model matching on a set of two 
representative images are also presented. 


Il. Area Cross-Correlation 


A common approach to matching two scenes 
is the use of an area cross-correlation measure. 
This measure, the cross-correlation coefficient, 
is proportional to the root-mean-square differ- 
ence between the corresponding pixel intensities 
for the two scenes. The cross-correlation coef- 
ficient also measures the common energy between 
the two scenes, as a function of the position of the 
smaller scene (generally the reference scene) 
within the bounds of the larger scene (generally 
the sensed scene). In this matching process, it 
is assumed that the maximum value of the cor- 
relation coefficient will appear at the correct 
match point, and that the corresponding coordi- 
nates of the match relate the position of reference 
scene in the sensed image. 


It is often found in practice that the magnitude 
of this correlation coefficient at the correct 
match position is lower than expected, and that 
the maximum value appears at an incorrect match 
position. This effect is commonly known as 
false-fixing. This adverse result may occur even 
if both the sensed and reference images contain an 
amount of unique geometrical signature normally 
considered adequate for reliable correlation. The 
false-fixing in this case may occur due to 
improper image point registration and differences 
in contrast levels between the sensed and refer- 
ence images. 


The problem of improper image registration 
can be handled by an image transformation 
process. However, the prediction of differences 
in contrast levels is very difficult as it requires 
a knowledge of the performance of sensors in 


*Now with the Defense Advanced Research Projects Agency. 


Sponsored by Defense Advanced Research Projects Agency, Contract F30602-76-C-0074. 


kd 


774 


mapping a wide variety of scenes. Furthermore, 
the contrast levels derived by the imaging sensor 
may vary as a function of weather conditions, 
season of the year, day versus night operation, 
and aspect angle. 


The use of correlation based matching also 
presents difficulties in other situations. These 
difficulties may arise due to differences in the two 
images in terms of their range (scale), viewing 
angle, context (missing or additional parts in 
image), imaging sensor type, etc. For example, 
the sensors for the two images operate at different 
wavelengths, with different resolutions, or one 
sensor may be active and the other passive. In 
order to successfully use correlation in such 
situations, a transformation on the available 
reference scene is required to compensate for 
some of the differences between the two scenes 
being matched. It should be noted that in practice, 
in addition to an enormous computations] power 
required, it is sometimes difficult to implement 
the transformations needed on the reference 
scene. Also, in some of these situations, mul- 
tiple references may be stored to compensate for 
differences such as viewing angles and context. In 
this case, the memory requirements for storing 
the required number of references may be 
enormous. 


II. Alternative Matching Schemes 


It was seen in the last section that the match- 
ing process using correlation often fails due to 
certain differences between the reference and 
sensed images. In order for a matching scheme 
to be insensitive to a greater degree to these dif- 
ferences, it must use invariant features based on 
some geometric distribution data or description. 
The effect of changes due to range, viewing angle, 
illumination, sensor type, etc. on the geometric 
features is not as dramatic and, therefore, the 
use of these in a matching scheme avoids the dif- 
ficulties present in correlation techniques. 


A collection of geometric features and their 
relationship between them is called a scene model. 
The general form of a model is a data structure 
composed of a collection of feature descriptors, 
their placement and relationship in the scene. All 
of this information can be represented compactly 
by a mathematical representation called a graph, 
consisting of nodes and links between the nodes. 
The feature names become labels on the nodes, 
and their relationships become labels on the links 
in the graph. The class of techniques which use 
such model representaticns for reference and 
sensed images for matching purposes is known as 
the model-based scene matching. 


Model-based matching schemes use geometric 
features in a flexible manner to deal with a wide 
range of image differences. This process allows 
a convenient way of incorporating information 
about contextual and other image variations. It 
should be noted that in case of a large difference 
in viewing angles, it may still be necessary in 
this matching technique to store multiple refer- 


ence images. However, the relerence generation 
> 


process in this case is much simpler and needs 
greatly reduced memory requirements. This is 
because only the information describing the image 
model needs to be generated and stored rather 
than the entire grey level image. One other 
extremely useful characteristic of model-based 
matching is that it is possible to find a partial 
match between the reference and sensed images in 
situations where only a small part of the refer- 
ence image has a match with a certain part of the 
sensed image. It can be extremely difficult to 
find a partial match of this kind through a cor- 
relation matching approach. 


The next section discusses the characteris- 
tics of different classes of image features useful 
for building a scene model. 


IV. Image Features for a Scene Model 


A careful examination of image features 
shows that they may be divided into three general 
categories: point, local, and global types. Fig- 
ure 1 shows an example of a simple scene and its 
representation using the three types of image 
features. The trade-offs between these features 
are discussed below. 
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Figure l. Scene representation through 
the point, local, and global types 
of features. 


Point Features 


Point features are used to represent a scene 
as a matrix of point values for every resolution 
element or pixel in the image. The point value 
for each pixel represents either the intensity mag- 
magnitude (a function of the reflectivity and 
emissivity) or the range from the sensor to the 
point. Figure | shows a matrix of values which 
represent the pixel intensities. An advantage of 
point measures is that they are usually available 
directly from the image sensors with little addi- 
tional processing required for their extraction. 
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At the same time, the major disadvantage of point 
feature measures is that they have poor invariance 
characteristics. For example, they can vary 
widely with small changes in illumination and 
contrast levels. It is sometimes possible to per- 
form transformations on the point feature data to 
overcome the lack of invariance, but as a rule, 
these transformations are computationally very 
complex. 
relation matching techniques usually use point 
feature image representations. 


Local Features 


Local feature measures include average 
intensity over an area, texture properties over an 
area, locally connected line segments and curves, 
and line and curve intersections. Features based 
onlocal measures have greatly improved invari- 
ance characteristics compared to point measures. 
These invariance characteristics arise from local 
averaging and the use of relative measures, as in 
the detection of edges. The presence of a line 
segment for example will not change for a wide 
range of illumination and viewing angle changes, 
even though the absolute values of the point fea- 
tures producing the gradient may shift 
dramatically. 


The point features represent an image exactly, 
although with little invariance or data compres sion 
efficiency. Local features on the other hand, 
represent an image in an abbreviated or abstract 
manner. The relative positions and orientations 
of bright line segments and line intersections, for 
example, may be sufficient to specify an object's 
shape. A model using local features for image 
matching has the advantage of greater invariance 
to image differences and a smaller memory 
requirement compared to that for point feature 
matching. An example of a local feature model is 
also shown in Figure l. In this example the local 
features are corners. ; 


Global Features 


The last category of features for image 
matching are global features. These include 
entire surfaces, shapes, and objects that have 
been segmented or extracted from an image. A 
global representation or model for a building 
might consist of several rectangles connected in 
a specific way. A trivial global representation is 
shown in Figure 1. . 


The global features have a high degree of 
invariance to image differences. Unfortunately, 
however, the global features are the most diffi- 
cult to successfully extract. This is because they 
depend on the segmentation of complete, intact 
regions or surfaces from the scene. Use of fea- 
tures based on a partial or incomplete segmenta- 
tion may give incorrect results, and thus a 
complete and a near perfect scene segmentation is 
essential. Note that these requirements are 
relaxed in the case of local features because use 
of partially extracted local features can give 
correct results. 


It should be noted that the traditional cor- 


The results of the above discussion on image 
features has been summarized in Table I. From 
this table, it is seen that the point features suffer 
from the poor invariance characteristics and the 
global features pose an exceedingly difficult 
extraction problem. The remaining alternative is 
the use of local features for building a scene 
model. While not as invariant to image difference 
as global features, they can be extracted with 
relative ease and have many invariance and stor- 
age advantages over point features. In the next 
section, a scene model which uses local features 
is described. 


Table I. Comparison of the sensitivity of 
image features to differences between 
reference and sensor images, and the 

degree of processing difficulty 
encountered in correcting for 
these differences and in 
extracting the image 
features 


TRANSFORMS TO 
CORRECT FOR 
INVARIANCE 
ERRORS 


INVARIANCE | EXTRACTION 


CATEGORY RANK DIFFICULTY 


POINT FEATURES | POOR 
LOCAL FEATURES} GOOD 


TRIVIAL 
MODERATE 


DIFFICULT 


NOT ALWAYS 
NECESSARY 


EXCEEDINGLY NOT 
DIFFICULT NECESSARY 


GLOBAL 
FEATURES 


EXCELLENT 





V. A Vertex-Based Scene Matching Scheme 


In this section, we present an alternative 
scene matching scheme which makes use of a 
model based on the local structural features 
present in the scene. At this stage in our 
research program, we are using quite simple 
scene features, namely, straight lines to con- 
struct our vertex-based scene model. The 
straight line features in a scene are quite robust, 
as they are insensitive, to a large degree, to a 
wide range of changes in illumination, viewing 
angle and to the use of different types of imaging 
sensors. Also the storage required for a refer- 
ence model is minimal compared to that for a 
cross-correlation scheme, The reference vertex 
model is derived ahead of time from the given 
reference image. It consists of a data structure 
representing the vertices and their interconnec- 
tions in the scene. A vertex-based model and a 
technique for matching two such models is 
described below. 


A Vertex-Based Scene Model 


The vertex model used in this matching 
scheme is based on the prominent straight line 
edge-segments present in the scene. The first 
step in this process of extracting edge-segments 
consists of performing edge detection at every 
pixel in the given grey level picture. The posi- 
tion and direction of the edge-elements present 


in the picture are o 


btained through this step.’ An 


edge-segment as defined here consists of a num- 
ber of edge-elements which lie on the same 
straight line. The process of grouping the edge- 
elements into edge-segments is accomplished 
through the use of Hough transformation* and a 
local clustering algorithm. A straight line tracing 
technique is used to locate and verify the existence 
of edge-segments in the picture. A detailed 
description of the complete process of edge- 


segment extraction 
and Luk.° 


is found in a paper by Dudani 


The straight line edge-segments obtained in 
this manner are then used to locate the vertices in 
the scene. All possible intersections for the 
extracted straight lines are first determined. It 


is then checked to 5 


ee if an intersection is real 


(does it actually exist in the scene) or virtual. 


Only the real inters 


ections are considered. A 


method is used to combine real intersections 
lying in a close group to forma vertex. The 


degree of a vertex i 


s defined as the number of 


lines originating from it. A value of 1 for the 


degree of a vertex i 
of a straight line an 


ndicates that it is the end point 
d that no other lines originate 
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from it. The vertex data obtained may not nec- 
essarily form a single graph, i.e., it is quite 
valid to have isolated lines or groups of lines. A 
vertex-based model thus consists of a list of 
vertices and their interconnections. 


An example of the complete process of obtain- 
ing the vertex model from a given scene is shown 
in Figure 2. 


Model Matching Process 


The vertex model for the reference scene is 
determined ahead of time and stored. In arriving 
at the reference model, only the most prominent 
and strong line and vertex features are chosen to 
ensure that most of the reference model will be 
visible and easily extractable from the test scene. 
One can thus include (perhaps interactively) the 
particular application goal-oriented information 
in building the reference model. Generally, a 
larger number of vertices are present in the test 
model than in the reference model, due to the 
lack of an opportunity to select only the most 
prominent and strong features. Figure 3 shows 
an example of reference and test scenes along 
with their vertex models. 





(b}. HUECKEL EDGE POINTS 





{d} VERTEX MODEL 


An example of an extraction of a vertex-based scene model. 


The vertex model is stored as a data structure 
which is used during the matching process. The 
matching scheme is based on a comparison of 
the local structure of the reference nodes to that 
of the test nodes. The structure of a node is 
defined by the number of links originating from it, 
the angles subtended by the links and the length 
ratios. A detailed description of this matching | 
scheme is found in a paper by Dudani and Clark.’ 
An earlier matching process based on the use of 
the minimal spanning tree is found in the paper 
by Bullock, et al.’ The two images, reference 
and test, shown in Figure 3, were obtained from 
two different perspective views. It should be 
noted that some portions of the reference imagery 
is not present in the test imagery, and that the 
two images do not have just a simple linear 
transformation between them. The model match- 
ing results on these two images are given in 
Table II. This table gives the reference vertices 
and their corresponding match from the test 
vertices. Note that no match was found for the 
two reference nodes 4 and 5, as they do not exist 
in the test scene due to the aspect angle change. 
However, correct matches were found for all of 
the other remaining reference nodes. 
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(c) REFERENCE VERTEX MODEL 


Figure 3. 


VI. Summary 


Area cross-correlation and model-based 
scene matching techniques were considered in 
this paper. It was pointed out that the traditional 
correlation techniques fail in many practical 
applications from differences between the refer- 
ence and sensed scenes in their range, viewing 
angle, illumination, contrast level and imaging 
sensor type. It was suggested that in order fora 
matching technique to be insensitive to a greater 
degree to these differences, it must use invariant 
features based on some geometric distribution 
data or description. A collection of geometric 
features and their relationship between them were 
used to define a model to represent a given scene. 
Comparison was given for the use of point, local 
and global types of features in constructing a 
scene model. It was shown that local feature 
types were best suited for a scene model because 
of their invariant characteristics anda relatively 
easy extraction process. A specific example of 
matching two scenes based on a vertex model was 
also presented. 





(b) TEST SCENE 
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(d) TEST VERTEX MODEL 


The reference and test scenes along with their vertex models. 
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